Spatio-textual similarity joins

Authors

  • Panagiotis Bouros
  • Shen Ge
  • Nikos Mamoulis
Abstract

Given a collection of objects that carry both spatial and textual information, a spatio-textual similarity join retrieves the pairs of objects that are spatially close and textually similar. As an example, consider a social network whose users are tagged both spatially and textually (i.e., with their locations and profiles). A useful task (e.g., for friendship recommendation) would be to find pairs of persons who are spatially close and whose profiles have a large overlap (i.e., they have common interests). Another application is data de-duplication (e.g., finding photographs that are spatially close to each other and have a high overlap in their descriptive tags). Despite the importance of this operation, very little previous work studies its efficient evaluation, and that work uses a different definition in which only the best match for each object is identified. In this paper, we combine ideas from state-of-the-art spatial distance join and set similarity join methods and propose efficient algorithms that take into account both spatial and textual constraints. In addition, we propose a batch processing technique that boosts the performance of our approaches. An experimental evaluation using real and synthetic datasets shows that our optimized techniques are orders of magnitude faster than baseline solutions.
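
To make the join definition concrete, here is a minimal sketch of the quadratic nested-loop baseline. It is not the optimized algorithm proposed in the paper. The abstract does not name the distance or similarity measures, so Euclidean distance and Jaccard similarity are assumptions made here, as are the names `naive_st_join`, `eps`, and `theta` and the sample data.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure; 0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_st_join(objects, eps, theta):
    """Compare every pair of (id, location, keyword_set) objects.

    A pair qualifies when its locations are within distance eps
    and its keyword sets have Jaccard similarity of at least theta.
    """
    result = []
    for (id1, loc1, text1), (id2, loc2, text2) in combinations(objects, 2):
        if dist(loc1, loc2) <= eps and jaccard(text1, text2) >= theta:
            result.append((id1, id2))
    return result


# Hypothetical user profiles: (id, (x, y), interests)
people = [
    ("ann", (0.0, 0.0), {"hiking", "jazz", "chess"}),
    ("bob", (0.5, 0.5), {"hiking", "jazz", "films"}),
    ("cat", (9.0, 9.0), {"hiking", "jazz", "chess"}),
    ("dan", (0.2, 0.1), {"cooking"}),
]
pairs = naive_st_join(people, eps=1.0, theta=0.5)
print(pairs)
```

Only "ann" and "bob" satisfy both constraints: "cat" shares all of ann's interests but is too far away, and "dan" is nearby but shares no interests. This sketch checks every pair, which is the kind of exhaustive comparison that join algorithms aim to avoid.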

Similar resources

Similarity Search on Spatio-Textual Point Sets

User-generated content on the Web increasingly has a geospatial dimension, opening new opportunities and challenges in location-based services and location-based social networks for mining and analyzing user behaviors and patterns. The applications of such analysis range from recommendation systems to geo-marketing. Motivated by these needs, querying and analyzing spatio-textual data has receiv...

Similarity-based Queries for XML Databases Using ELIXIR

Existing XML query languages do not support ranked query results based on textual similarity. For example, Fig. 1 shows an XML database containing books and CDs. We are interested in information-retrieval-style queries such as “order the items by similarity to the phrase ‘traditional Ukrainian’” or “find books and CDs with similar titles.” Unlike similar efforts, our expressive and efficient lan...

SEAL: Spatio-Textual Similarity Search

Location-based services (LBS) have become more and more ubiquitous recently. Existing methods focus on finding relevant points-of-interest (POIs) based on users’ locations and query keywords. Nowadays, modern LBS applications generate a new kind of spatio-textual data, regions-of-interest (ROIs), containing region-based spatial information and textual description, e.g., mobile user profiles wit...

Text Joins for Data Cleansing and Integration in an RDBMS

An organization’s data records are often noisy because of transcription errors, incomplete information, lack of standard formats for textual data or combinations thereof. A fundamental task in a data cleaning system is matching textual attributes that refer to the same entity (e.g., organization name or address). This matching can be effectively performed via the cosine similarity metric from t...

STEWARD: demo of spatio-textual extraction on the web aiding the retrieval of documents

A spatio-textual search engine, termed "STEWARD", is demonstrated where document similarity is based on both the textual similarity as well as the spatial proximity of the locations in the document to the spatial search input. STEWARD's performance is enhanced by the presence of a document tagger that is able to identify textual references to geographical entities. The user interface of STEWARD p...

Journal title:
  • PVLDB

Volume 6  Issue 

Pages  -

Publication date 2012